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Abstract — In this paper, we propose a path following replicator dynamic, and investigate its potentials in uncovering the underlying 
cluster structure of a graph. The proposed dynamic is a generalization of the discrete replicator dynamic, which evolves to a local 
maximum of the optimization problem max/(x) = x^Vl^x,x e A (W \s a non-negative square matrix and A is a simplex), and has 
been widely used to stimulate the evolution process of animal behavior. The discrete replicator dynamic has been successfully used to 
extract dense clusters of graphs; however, it is often sensitive to the degree distribution of a graph, and usually biased by vertices with 
large degrees, thus may fail to detect the densest cluster. To overcome this problem, we introduce a dynamic parameter e, called path 
parameter, into the evolution process. That is, we successively solve a series of optimization problems, max /(x) = X^H^X, 
where = {x| J^i^i = l,xi e [O,^]}. The path parameter e can be interpreted as the maximal possible probability of a current 
cluster containing a vertex, and it monotonically increases as evolution process proceeds. By limiting the maximal probability, the 
phenomenon of some vertices dominating the early stage of evolution process is suppressed, thus making evolution process more 
robust. To solve the optimization problem with a fixed e, we propose an efficient fixed point algorithm. Intuitively, the proposed dynamic 
follows the solution path of the optimization problems max/(x) = x^Vl^x,x g A^, with gradually expanding domain Ag. The key 
properties of the proposed path following replicator dynamic are: 1) its probability to evolve to the most significant cluster of graph is 
much higher than discrete replicator dynamic, 2) the path parameter e offers us a tool to control the evolution process, and we can use 
it to simultaneously obtain dense subgraphs of various specified sizes, and 3) the evolution process is essentially a shrink process of 
high-density regions, thus reveals the underlying cluster structure of graph. The time complexity of the path following replicator dynamic 
is only linear in the number of edges of a graph, thus it can analyze graphs with millions of vertices and tens of millions of edges on a 
common PC in a few minutes. Besides, it can be naturally generalized to hypergraph and graph with edges of different orders, where 
/(x) becomes a polynomial function. We apply it to four important problems: maximum clique problem, densest /c-subgraph problem, 
structure fitting, and discovery of high-density regions. The extensive experimental results clearly demonstrate its advantages, in terms 
of robustness, scalability and flexility. 

Index Terms — Cluster, Replicator Dynamic, Path Following Replicator Dynamic, Dense Subgraph, High-density Region 



1 Introduction 

REplicator dynamic (H is a deterministic monotone 
non-linear game dynamic used in evolutionary 
game theory Q, a field which models the evolution pro- 
cess of animal behavior using game theory. Considering 
a large population of individuals which compete for a 
particular limited resource, each individual can choose 
one strategy from a set of predefined strategies. Let 
/ = {1, . . . , n} be the set of pure strategies and let Xi{t) 
be the probability of the population members playing 
strategy i at time t. The state of the system at time t 
is represented by a vector x(t) = . . . , Xn{t))^ . Let 

W = (wij) be the n x n payoff matrix. Specially, for 
each pair of strategies i,j G /, Wij represents the payoff 
of an individual playing strategy i against an opponent 
playing strategy j. Without loss of generality, we shall 



assume W is nonnegative, i.e., Wij > for all i,j e I. 
The discrete replicator dynamic (DRD) can be expressed 
in the following form ||T|: 



x^(t)Wx(t) 



(1) 



It is well known that ([T]) is a growth transformation 
for the following optimization problem ||3l : 



max/(x) = x"^Wx, X G A, 



(2) 
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where A = {x| = 1, G [0, 1]} is a simplex. In fact, 

the sequence {x(t)} converges to a local maximum of Q 
along an increasing trajectory [31 . 

For each graph G = {V^E^ W), we can regard it as a 
game, with each vertex v e V representing a strategy, 
and the weighted adjacency matrix of graph G being 
the payoff matrix W. Then the evolution process ^ 
can be interpreted as a cluster extraction process (H. 
X is a soft indicator vector of the cluster Cx, with Xi 
being the probability of the cluster Cx to contain vertex 
Vi. The objective function /(x) in is a measure of 
compactness of the cluster Cx, and each local maximum 
of © represents a potential dense cluster on G O, ||6|. 
Many previous works describe the relations between 
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local maxima of (0 and clusters on G ||6|, Q, III- For 
example, the well-known Motzkin-Straus theorem ||7| 
states that for unweighted graph, the global maximum 
of (|2| corresponds to the maximum clique on G. 

Despite its success in cluster extraction, discrete repli- 
cator dynamic is limited in several aspects. First, al- 
though it evolves to a maximum of (0, this maximum 
may be not the global maximum of (|2|. In fact, the 
evolution process ^ is very sensitive to the degree 
distribution of graph G, and the vertices with high 
degrees usually bias the evolution process |[Z|. Second, 
the clusters corresponding to the local maxima of (|2| 
are often extremely compact, such as the cliques on 
unweighted graphs ||9|. In some applications, this may be 
a problem, since the desired clusters are not so compact. 
In such situations, we may want to control the sizes of 
detected clusters. Third, there may be multiple dense 
clusters on graph, but discrete replicator dynamic only 
reveals one of them. For example, in a social network 
graph, there are usually multiple communities. These 
communities are relatively compact, but not as compact 
as cliques. Ideally, we need an algorithm to automatically 
reveal all these communities. 

To fulfil these practical needs, in this paper, we pro- 
pose a new replicator dynamic, called path following 
replicator dynamic (PFRD). Our proposed dynamic is 
based on the following observation: the sensitivity of the 
discrete replicator dynamic ^ to the degree distribution 
of graph G is mainly caused by the fact that vertices with 
high degrees often have too large values in x at the early 
stage of evolution process. As a consequence, these ver- 
tices may dominate the replicator process. For example, 
in the maximum clique problem, if the initialization is 
x(0) = {-,..., ^}, which is commonly used [9j, then 
x(l) = . . . , ^}, where di is the degree of Vi and 
(ij] = Y^^^i di. If a vertex, say Vi, has a very large degree, 
then xi(i) will be very large, and the replicator process 
will be greatly biased and most likely to converge to 
a clique containing Vi. If Vi does not belong to the 
maximum clique, then the discrete replicator dynamic ([ij 
converges to a local maximum, not the global maximum 
of ©. 

We propose to overcome this problem by introducing 
a parameter e, called path parameter, into the evolution 
process. Mathematically speaking, we sucessively solve 
a series of optimization problems in the following form, 
with monotonically increasing e: 

max / (x) = x^ W^, x G . (3) 

X 

where = {x| ^1^=1 = G [0,6:]} is a subset of 
the simplex A. 

Due to the constraint Y^^^iXi = 1, the minimal value 
of £ is ^ . Theoretically, e can continuously increase from 
^ to 1. In our implementation, we samples m discrete 
values within the range [^,1], which form a set <l> = 
{ei,...,em}, with si = ^, Sm < 1/ and Si < e^+i for 
alH = 1, . . . , m — 1. When e = -, the maximizer of (O 



is X = . . . , ^}. In our approach, the local maximizer 
of © at £ = serves as the initialization of © at e = 
Sij^i. In this way, x follows the solution path of ^ as 
e increases from ^ to e^. The choice of Sm depends on 
the applications. When e^ = l, we get a local maximizer 
of ©; however, due to the path parameter e, this local 
maximizer is more likely to be the global maximizer of 
0, as explained later and verified by experiments. 

The proposed approach is a direct generalization of 
discrete replicator dynamic, and it has several advan- 
tages. First, since e gradually increases from ^ io Smr 
at early stage of evolution process, no vertex has large 
values. Thus, the evolution process is not sensitive to 
the degree distribution and mainly depends on the 
overall structure of graph. When = 1, the evolution 
process has much higher probability to converge to the 
global optimum of 0, compared with discrete replicator 
dynamic. From another prospective, the path param- 
eter e prohibits that some components of x suddenly 
increase from small values to very large values, which 
is also reasonable for simulation of animal behavior. 
As a whole, animals change their behaviors gradually, 
not suddenly. Thus, the proposed dynamic may better 
stimulate the evolution process of animal behaviors. 
Second, the parameter e offers us a flexible tool to control 
the evolution process. Since Xi < e and Xir=i ~ 
there are at least [^] nonzero components in x, where 
[^] represents the smallest integer larger than or equal 
to ^. In other words, the cluster Cx contains at least 
[^] vertices. Hence, we can control the size of obtained 
clusters. For example, if we want to get a dense cluster 
with k vertices, then we can simply set = p A typical 
application of this property is the densest A;-subgraph 
problem (D/cS) iFTOl . Note that some components of x 
may be very small, thus, in our implementation, we use 
a threshold 5i to judge whether a vertex belongs to Cx or 
not. Specifically, we define Cx = {vi\xi > Si}, where Si is 
a small constant whose value is specifically decided for 
different applications. Third, since the evolution process 
mainly depends on the overall structure of graph, it 
can robustly reveal the underlying cluster structure. In 
fact, the evolution process can be regarded as a gradual 
simplification process of graph G, and as e increases, 
vertices which have relatively weak connections with 
current graph shall be dropped. Clearly, this is essen- 
tially a shrink process of high-density regions in the data, 
and the dropped vertices can be considered as outliers. 
Note that discrete replicator dynamic does not have such 
properties, since it is sensitive to degree distribution. 
Fourth, the proposed approach is very efficient, with 
linear time complexity in the number of edges. Thus, 
we can efficiently analyze the cluster structure of very 
large graphs, such as network graphs. 

Fig. 1 demonstrates the evolution processes of both 
discrete replicator dynamic and path following replicator 
dynamic on an unweighted graph G. The components of 
X is in accordance with the lexicographical order of the 
vertices, that is, xi represents a, X15 represents o, and so 



(a) (b) (c) (d) (e) 

Fig. 1. Evolution processes of discrete replicator dynamic and the proposed path following replicator dynamic, (a) 
A graph with 15 vertices, with a clique of size 4, {a,b,c,d}, and a clique of size 3, {e,f,g}. (b) Evolution process of 
discrete replicator dynamic. Vertices f and g have relatively large degrees, and thus have relative large values xe and 
X7 at the beginning of evolution process. The process wrongly evolves to a local maximum of Q, which corresponds 
to the clique {e,f,g}. (c) Evolution process of path following replicator dynamic. The components of x are constrained 
by the path parameter, thus no vertex can dominate the evolution process at early stages, and the process correctly 
converges to the global maximum of dSji, which corresponds to the maximum clique {a,b,c,d}. The evolution process 
of our approach also clearly reveals the cluster structure of this graph: the vertices h, i, j, k, I, m, n and o disappear 
first, then the vertices in the clique {e,f,g}, finally only the vertices in the maximal clique remains. For comparison, the 
evolution processes of components corresponding to nodes a and f in discrete replicator dynamic and our approach 
are shown in (d) and (e), respectively. 



on. According to the Motzkin-Straus theorem, the global 
maximum of (O corresponds to the maximum clique, 
{a,b,c,d}. However, due to sensitivity to degree distribu- 
tion, the discrete replicator dynamic wrongly converges 
to a local maximum of (0 corresponding to the clique 
{e,f,g}. In comparison, our algorithm evolves to the 
global maximum of (0 and correctly finds the maximum 
clique. In the evolution process of discrete replicator 
dynamic, xq and xr have relatively large values after 
the first iteration since f and g have more neighbors. 
Hence, the evolution process is biased by f and g^. The 
value of Xq as a function of t is shown in (d). Obviously, 
it increases quickly in the first few iterations. In the 
path following replicator dynamic, all components of x 
change more smoothly, e.g. xi and xq shown in (e). 

The rest of the paper is organized as follows. We first 
review the related work in Section 2. In Section 3, a 
fixed point method is introduced to efficiently solve ©, 
and the sampling strategy for path parameter is also 
discussed. The experimental results on four problems, 
namely, maximum clique problem, densest /c-subgraph 
problem, structure fitting and discovery of high-density 
regions, are demonstrated in Section 4. In Section 5, the 
conclusive remarks are made. 

2 Related Work 

Cluster analysis is a basic problem in various disciplines 
(m, such as pattern recognition, data mining and com- 
puter vision, and a huge number of such methods have 
been proposed. It is beyond the scope of this paper to 
list all of them, therefore, we focus on methods closely 
related to ours. 

1. Such phenomenon is more obvious and dominant on large graphs. 



The discrete replicator dynamic has been used for 
cluster extraction for a very long time, probably dating 
back to the well-known work of Motzkin and Straus |[7|, 
which relates the global maximum of (O with the max- 
imum clique of graph. The maximum clique problem 
m is a very fundamental problem in computer science, 
but it is NP-complete [7]. According to the Motzkin- 
Straus theorem, we can get the maximum clique by 
solving (O. In [8 J, the discrete replicator dynamic has 
been used to extract cluster on weighted graph. The 
obtained cluster, called dominant set, is a generalization 
of the concept of maximal clique. To efficiently enumer- 
ate dense clusters on graph, a fast algorithm has been 
proposed in |6l, which evolves to a dense subgraph by 
iteratively shrinking and expansion. In the shrink stage 
of this method, the discrete replicator dynamic is used 
to extract dense clusters. The clustering methods based 
on replicator dynamic have been successfully applied to 
many tasks, such as image segmentation ||8|, point set 
and image matching HI, ||5l, ||6|, (T2|, |[T3ll and stereo 
correspondence [14l. All these methods rely on discrete 
replicator dynamic to get the global optimum of 0; 
however, discrete replicator dynamic ^ is very sensitive 
to the degree distribution of graph G, and usually does 
not evolve to the global maximum of 0. Since our 
approach has a much higher probability to evolve to the 
global maximum of than discrete replicator dynamic, 
it is more suitable for these applications. 

Methods which try to extract dense clusters of certain 
sizes are also related to the proposed approach. For 
example, the methods to solve the densest /c-subgraph 
problem (DkS) ||10|, (151 . In hypergraph clustering, Liu 
et al. dH proposed a method to control the least size 
of extracted clusters, and showed state-of-the-art results. 
Utilizing the fact that dense clusters are very robust. 
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they also generalized kNN and proposed an alternative, 
kDN fTT]. kDN is in fact a dense cluster of size k 
which has a strong connection with an object. Obviously, 
our approach can be used to solve these problems. In 
fact, our approach is more robust. In our approach, the 
parameter e controls the least size of obtained clus- 
ters. Moreover, our approach can simultaneously and 
robustly find dense clusters of different sizes in one run 
of evolution process. 

Since dense subgraphs correspond to high-density re- 
gions in the data, the evolution process of path following 
replicator dynamic can be considered as a shrink process 
of high-density regions. The task of estimating high- 
density regions from data samples is a fundamental 
problem in a number of works, such as outlier detection 
and cluster analysis IITSl , |[T9l . The advantage of our 
method for this task is that our method can gradually 
reveal the landscape of multiple high-density regions of 
various shape at different scales. High density regions 
usually represent modes of data, and in this sense, our 
method is also closely related to mode-finding methods, 
such as mean shift ll2Ql . 

3 Algorithm 

The central part of path following replicator dynamic is 
to efficiently solve ©. We first analyze the properties of 
the solution, then present our algorithm. 

3.1 Properties of Local Maximizers 

In |[T6l , the properties of local maximizers of (0 have 
been analyzed. Here we give a brief summary. 

By adding Lagrangian multipliers A, /ii, • • • , /in/ Mi > 
and , • • • , i^n/ > for alH = 1, • • • , n, we can obtain 
the Lagrangian function of ©: 

n n n 

L{^,\lj) = f{x) - A(^ Xj-l)^^ lljXj ^^Vi{£-Xi). 

i=l i=l i=l 

(4) 

According to Karush-Kuhn-Tucker (KKT) condition [ZTj, 
if X* is a local maximizer of (|3]), then 

{^(x) - A + /ii - = 0, z = l,---,n, 
EliM.<=0, (5) 

^(x) represents the partial derivative of /(x) with 
respect to Xi, and the partial derivatives of /(x) with re- 
spect to all components of x form a vector ^(x) = §|(x). 
Since x*, /i^ and are nonnegative for alH = 1, • • • , n, 
ZliLi l-^i^i = ^ is equivalent to saying that if x* > 0, 
then jj^i = 0, and Yl^^i ~ ^i) = is equivalent to 
saying that if < e, then = 0. Hence, based on 
simple algebraic calculations, the KKT conditions can be 
rewritten in the following form: 

-^(x)<^ =A, 0<xt<e; (6) 
[ > A, X* = e. 



Any point satisfying the KKT condition ([6| is called a 
KKT point. Since KKT condition is a necessary condition, 
a local maximizer of © is also a KKT point. However, 
a KKT point may be not a local maximizer of (|6|. 

The KKT condition ([6| is important for our algorithmic 
design. It has an intuitive geometric meaning: the partial 
derivatives with respect to all variables in the range 
(0, have the same value A, the partial derivatives with 
respect to variables having value should be not larger 
than A, and the partial derivatives with respect to all 
variables having value e should be not smaller than A. 

3.2 Truncated Simplex Projection 

In this section, we propose an efficient algorithm to solve 
The discrete replicator iteration ^ can be rewritten 
into the following form: 

x(t + 1) = proj^(x(t) ^(x(t))), (7) 

where stands for element-wise multiplication, and 
proj^(y) is a projection, which projects a nonnegative 
vector y onto the simplex A = {x|xi > 0, Yl^^i Xi = 1} 
by £i normalization. That is, 

proj^(y)i = . (8) 

This projection is called simplex projection, and its main 
characteristic is: all components of y scale uniformly. 

For the problem 0, the feasible region of x is = 
{x.\xi e [0,6], Xir=i = ^}}' ^ subset of A. Obviously, 
simplex projection cannot be used here, since some com- 
ponents may exceed e after projection. To overcome this 
problem, a natural idea is: if some components exceed e 
after projection, then we set their values to be e and scale 
other components uniformly, this process iterates until 
no component is larger than e after projection. The new 
projection proj^ (y), called truncated simplex projec- 
tion, is a direct generalization of the simplex projection. 

Suppose U contains all components of y whose values 
should be set to e after projection and V = I/U, then 
the truncated simplex projection can be mathematically 
described in the following form: 

proj^^(y), = | iinlk \ly^ (9) 

where \U\is the cardinality of set U. Obviously, U should 
satisfy two criteria: 1) the components of y in are 
larger than the components of y in V, and 2) no element 
in U can be moved to V. In ©, all components in U are 
set to £, and other components are scaled to make the 
sum of all components equal to 1, thus the scaling factor 



The algorithm to compute the truncated simplex pro- 
jection of a nonnegative vector y is summarized in Alg. 
1. The critical part is to determine the set U. Since the 
components of y in are larger than the components 
in V , we first sort all components of y in descending 
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Algorithm 1 Truncated Simplex Projection proj^ (y) 

1: Input: The vector y and the path parameter e. 

2: Sort the components of y in descending order, 

3: Compute z = ^^^^ yi. 
Set /7 = 0. 
for z = 1 , . . . , n do 

Compute X = If x > set U = U U 

{si} and z = z — Hsi) Otherwise, break. 
7: end for 

8: Set V = I/U. For all i e U, set proj^ (y)^ = e. For 
all i e V, set proj^^ (y), = MldlM. 



9: Output: y' = proj^ (y). 



order, that is, {^^i, • • • , ^s^}, with ys, > ysi+, for all 
i = l,...,n — 1, and then check them one by one, 
from ys^ to ys^. When we check the i-th component 
^g., the first i — 1 components are all in U and the 
values of these components should be set to e after 
projection, then the sum of other n — i + 1 components, 
from the i-th component to the n-th component, should 
be 1 — (i — 1)6: after projection. Since the sum of these 



1 components before projection is 2; = 



then the scale factor is liziliiDl^ if the value of the i-th 
component is smaller than e after projection, then we 
have already found all elements in U ; otherwise, we put 
the i-th component in U and check next component ysi+i • 
It is easy to verify that: 1) proj^ (y) is a nonnegative 
vector, and the sum of all its components is equal to 1, 2) 
proj^ (y)i < e for alH = 1, . . . , n, and 3) if yi > yj, then 
proj^ (y)i > proj^ (y)j. Clearly, the simplex projection 
proj^(y) used in discrete replicator dynamic is in fact a 
special case of the truncated simplex projection proj^ (y) 
at 5 = 1. 

3.3 Fixed Point Iteration 

Based on truncated simplex projection, we can define an 
iteration similar to (O: 



x(t + 1) 



(10) 



proj^,(xW0^(xW))- 

Obviously, this iteration ensures that x always stays in 
the region A^. 

For simplicity of notation, we define an operator 
Pe{^) = proj^ (x ^(x)). Then the iteration (p^ can be 
simply expressed in the following way: 



x(t + l) = P,(x(t)). 



(11) 



From an initialization x(0), we can repeat this iteration 
until a fixed point of ^^(x) is obtained. The following 
two theorems build the relation between the fixed points 
of the operator ^^(x) and the KKT points of ©, and 
they form the theoretic foundation of path following 
replicator dynamic. 

Theorem 1. The KKT point x of (|3]) is a fixed point of 
the operator ^^(x). 



Proof: Since x is a KKT point of ©, according to the 
KKT condition we get: 



< A, 
= A, 
> A, 



Xi = 0; 
< ij < £; 

S . 



(12) 



Let Ui = {i\xi = £}, U2 = {i\xi e (0,6:)}, U3 = {i\xi = 0} 
and x' = ^^(x). We only need to prove that U = Ui 
in Alg. 1. This is because when U = Ui, we can get: 1) 
when i e Ui, according to Alg. 1, = e, 2) when i e U3, 
obviously, = 0, and 3) when i e U2, since ^i(x) = A 
and the components in /72 scale uniformly, then x[ = Xi. 
Thus, when U = Ui, x.' = 5c and x is a fixed point of 
P,(x). 

Now we prove that U = Ui. After sorting the compo- 
nents of y = X ^(x) in descending order, all elements 
in /7i are at the front, followed by the elements in U2, 
and by the elements in U3. According to Alg. 1, we need 
to prove two inequalities: 

..,„.,(! -dt/.l-l).)^^^ (13) 



En 
i=|C/i| Vs 



Vs 



Ui\ 



a^-\Ui\e) 



En 
i=\Ui\-\-l Vs 



< €. 



(14) 



The first inequality ensures that the |/7i|-th largest com- 
ponent of y is in /7, and the second inequality ensures 
that the + 1-th largest component of y is not in U. 

Since 1 - \Ui\e = Er=|c/i|+i and Vi > \Ui\,ys, = 
\xs,, we get: 

Ur\e) _ Ax,,,^,^,(l-|/7i|£) 



.(1 



En 
i=|f/i| + l Usi 

Thus, (O holds. 
For (p^ , from ^g. 



AEt 



XsiOsA^) and Xs^Ui\ 



— ^S|[/^| + i < ^- 



(15) 
e, we get: 



En 
i=|t/i| Vs 



> £ 



^i=\ui\+i^sigsi{^) 

9s,u. I (x) > = A, 



which is true according to the KKT condition ©. ■ 

When X is a fixed point of the operator ^^(x), is it a 
KKT point of (|3])? This question is complicated, and the 
following theorem partially answers this question. 

Theorem 2. If x is a fixed point of the operator ^^(x), 
then we get: 



= A, {) < Xi < s] 
> A, Xi — e. 



(16) 



where A is a constant. 



Proof: Let Ui = {i\xi = e}, U2 = {i\xi e (0,6:)}, U3 = 
{i\xi = 0} and x' = ^^(x). Since x is a fixed point of the 
operator Psi'x.), x.' = x. According to Alg. U = Ui and 
V = U2U U3. 

We first prove that ^i(x) = A when < Xi < e, that 
is, when i e 1/2- For any i^j e U2, since the projection 
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proj^ (y) does not change the relative scales of compo- 
nents'in V, then 4 = ^ = 1^^. Since 4 = 1^, then 

Uj Xjgj{x.) Xj Xj 

gi{St.) = gj{St.). That is, the partial derivatives with respect 
to all variables in U2 are the same, and we denote them 
by A. 

Now we prove that ^i(x) > A when Xi = e, that is, 
when i e Ui. Suppose Xi is the element in /7i with the 
smallest partial derivative, according to Alg. 1, we have: 



Algorithm 2 Path Following Replicator Dynamic 



Recall that i/i 
we get: 



y,{l-{\U,\-l)e) 
£5i(x) and Y.j(.u^ Vj 



> £. 



(1- 



(17) 

\Ui\e)K then 



Vi 



> e 



jeU2 



U,\e)>eYyj 



jeU2 



Since Xi is the element in /7i with the smallest partial 
derivative, for any z G /7i, ^i(x) > A. ■ 

Note that (O is part of the KKT condition (|6]), and 
the only difference between (p!6)l and ^ is the condi- 
tion on zero components of x. The KKT condition ^ 
requires that the partial derivatives with respect to zero 
components should be not larger than A; however, when 
X is a fixed point of the operator ^^(x), the partial 
derivatives with respect to zero components can have 
arbitrary values. This is not surprising, since in the fixed 
point iteration ([TTJ, when Xi{t) = 0, then Xi{t') = for all 
f > t, no matter how large the partial derivative gi{t') 
is. In fact, this is a common characteristic of both ([TTJ 
and discrete replicator iteration 0. 

In our approach, the initial initialization, that is, the 
initialization for the optimization problem ^ with e = 
si, is always x(0) = {^,...,^}. Theoretically, when 
graph G does not contain isolated vertices, all compo- 
nents of X are always positive, although many compo- 
nents approach zeros. Due to arithmetic underflow, some 
components will become zeros in the iteration process; 
however, this is because their partial derivatives are very 
small. Thus, the fixed points of the operator ^^(x) are 
usually KKT points of ©. In this evolution process, 
whether there is an exception or not, that is, whether 
there is a fixed point of the operator ^^(x) that is not 
a KKT point of ©, is an open problem. At least for 
the discrete replicator dynamic, if the initialization is 
x(0) = {^,...,^}, then it always evolves to a local 
maximizer of which is a KKT point of 

In conclusion, a KKT point of © is always a fixed 
point of the operator Pe{^), while the fixed point of 
(|3| in our proposed approach is usually a KKT point 
of (|3|, although in theory, exception may exist. Besides, 
according to our experiments, the obtained KKT point 
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Input: W and {^i, . . . , Sm}- 
Set Xinit = {-,..., -}. 
for i = 1, . . . , m do 

Set e = Si, x(0) = Xi^^it and t = 0. 
repeat 

t = t + 1; 

x(t)=P,(x(t-l)); 
until |x(t) -x(t- 1)1 < 82 
Set Xinit = x(t) and x(i) = x(t). 
end for 

Output: The solution sequence {x(l), . . . ,x(m)}. 



is usually a local maximizer of ©. Thus, we can get the 
solution of © with very high probability by the fixed 
point iteration (pj} . 

3.4 Path Following Replicator Dynamic 

Based on the fixed point iteration ^H) , we can get the 
solution of ^ for each e. The whole algorithm of path 
following replicator dynamic is summarized in Alg. 2. 

In Alg. 2, we use the solution of © with £ = as 
the initialization of the fixed point iteration (pT) with 
s = SiJ^i. The fixed point iteration terminates when the 
change of x is smaller than a threshold 62, which is set 
to 1 X 10~^ in all our experiments. Obviously, the path 
following replicator dynamic is a direct generalization of 
discrete replicator dynamic. When the path parameter 
only has one value, namely, e = 1, the path following 
replicator dynamic reduces to discrete replicator dy- 
namic. 

The output of Alg. 2 is the solutions of with all 
values of e, {x(l), . . . , x(m)}, which correspond to a 
series of clusters, {Cx(i), Cx(2), • • • , C'x(m)}- Intuitively, in 
the evolution process, x follows the solution path of the 
optimization problem ^ with increasing e, this is why 
we called the proposed approach path following replicator 
dynamic. 

Each x(i) represents a dense subgraph of G containing 
at least [^] vertices. When [^] is large, Cst{i) usually 
contains about [^] vertices; when [^] is small, Cx(i) 
may represent a clique (on unweighed graph) or a 
dominant set (on weighted graph) whose size is larger 
than [^]. In either case, Cx(i) is much denser than other 
subgraphs with similar sizes, thus represents important 
pattern underlying the data. Since Cx(^) is a dense cluster, 
the sequence {Cx(i) , Cx(2) , • • • , C'x(m) } can be regarded as 
gradual simplification of graph G, with dense parts of 
graph G retained. If graph G is constructed from feature 
points, then the sequence {Cx(i), Cx(2), • • • , Cx(m)} is in 
fact a shrink process of high-density regions, since dense 
subgraphs generally correspond to high-density regions 
[6|. The task of estimating high-density regions from 
data samples is fundamental in a number of problems, 
such as outlier detection, cluster analysis and one-class 
problem |[18|, |[19|. The advantage of our method is that 
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it can gradually reveal multiple high-density regions of 
various shapes at different scales. 

Fig. 2 demonstrates the evolution process of path fol- 
lowing replicator dynamic on a planar point cloud. There 
are three Gaussian clusters of different densities. The 
bottom middle cluster is the densest, the top right cluster 
is the second densest, and the top left cluster is the least 
dense. Clearly, the evolution process reveals the cluster 
structure in data. As e increases, the detected clusters 
become more and more compact. Background outliers 
are dropped first, then points in the least dense cluster 
are dropped, and then points in the second densest 
cluster are dropped, finally only a compact subset of 
points in the densest cluster remain. From this process, 
we can detect outliers and extract all dense clusters. We 
emphasize here that all useful information for these tasks 
is revealed by a single evolution process. 

3.5 Complexity Analysis and Further Speedup 

In the path following replicator dynamic, the basic oper- 
ation is the iteration x(t + 1) = Ps{^{t)), which includes 
three procedures: 1) calculation of the partial derivative 
^(x), 2) element-wise multiplication x ^(x), and 3) 
truncated simplex projection. The time complexity of 
calculating partial derivative is 0{\V\ + \E\), the time 
complexity of element-wise multiplication is 0(|F|), and 
the time complexity of truncated simplex projection is 
0(|y| log(|F|)), since a sort operation is needed. Thus, 
the time complexity of the iteration x(t + 1) = P£(x(t)) 
is 0{\V\ \og{\V\) + \E\), which is linear in \E\. Obviously, 
the time complexity of path following replicator dynamic 
is low. In fact, it can work on graphs with millions of 
vertices and tens of millions of edges efficiently. 

The iteration x(t + l) = Ps{'K{t)) has a useful property: 
if Xi{t) = 0, then Xi{t') = for all > t. Thus, 
if a component of x(t) becomes zero, we can drop 
the corresponding vertex and operate on the remaining 
graph. In path following replicator dynamic, when G has 
no isolated vertices, theoretically, the components of x(t) 
are always positive; however, many components of x(t) 
approach quickly, and then become due to arithmetic 
underflow. Thus, we can further accelerate Alg. 2 by 
setting very small components of x(t) to zeros. Specifi- 
cally, when a component of x(t) is smaller than a small 
threshold Ss, it will be set as zero and the corresponding 
vertex is dropped. In our experiment, we always set 
Ss = 1 X 10~^^. Note that even if we do not explicitly 
use such approximation method, the computer uses it 
implicitly by means of arithmetic underflow. In our 
experiments, we only use this approximation method 
when graph G is very large, such as web graphs, since 
the approximation has a possibility, although extremely 
small, to introduce errors. 

3.6 Sampling Strategies for Path Parameter 

Obviously, the path parameter plays a crucial role in path 
following replicator dynamic, and it is a tool for us to 



control the evolution process. To obtain the best result, 
we need to have a good sampling of path parameter. 
Generally speaking, a larger number of samples of the 
path parameter, and a better distribution of these sam- 
ples lead to better results. However, too many samples 
will render Alg. 2 inefficient. Compared to the number of 
samples, the distribution of samples is more important. 
A sampling with less samples but good distribution 
is usually better than a sampling with more samples 
but with bad distribution. Basically, we need to balance 
between effectiveness and efficiency. 

In our experiments, many aspects have been consid- 
ered to find a good sample strategy for path parameter. 
First, applications may explicitly require certain number 
of samples of the path parameter. For example, if we 
want to find three dense clusters of graph G, with sizes 
are 100, 200, 300, respectively. Then three samples of path 
parameter, namely, 2^ and j^, are needed. Second, 
the applications may require us to better explore dense 
subgraphs whose sizes are around a specified size. For 
example, to detect outliers in a dataset knowing that 
there are about 10% outliers, we need more samples of 
path parameter around q-^, where n is the number of 
data points. Third, to better suppress the possible dom- 
ination phenomenon, there should be no large intervals 
in samples. To work out a good sample strategy, we need 
to comprehensively consider all these aspects. 

3.7 Generalization to Hypergraphs 

The path following replicator dynamic can be easily 
generalized to hypergraph and graphs with edges of 
different orders. From graph to hypergraph, the only 
difference is /(x), which becomes a polynomial function. 
Since we do not assume any specific form for /(x) in 
our algorithmic design, the proposed algorithm can be 
directly applied to hypergraphs. Without loss of gener- 
ality, we also assume the weights of hyperedges are all 
positive, thus all coefficients of /(x) are positive. 

4 Experiments 

The proposed path following replicator dynamic is a 
versatile tool for many tasks. First, it is a generalization 
of discrete replicator dynamic that has a much higher 
probability to evolve to the global maximum of (0. Thus 
it can replace the discrete replicator dynamic in many 
applications, and usually lead to better performance. 
Second, it can partially control the sizes of extracted 
clusters, thus have much more flexibility than discrete 
replicator dynamic. Many applications, such as densest 
/c-subgraph problem, constrained clustering |[16| and 
dense neighborhood |T7|, require to extract clusters of 
certain sizes. Third, it essentially reveals the cluster 
structure of a graph. In the evolution process, outliers are 
dropped first, then inliers in less compact clusters, finally 
only vertices in a very compact cluster remain. Thus, 
it can naturally be used to extract clusters and detect 
outliers. Note that the main strength of path following 
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Fig. 2. Graphical illustration of evolution process of path following replicator dynamic on a planar point cloud, (a) 
illustrates the point cloud. There are three Gaussian clusters, with 90, 60 and 30 points, respectively. There are also 
180 uniformly distributed outliers, (b)-(h) demonstrate the solution of ^ with e being 3^0' 2I0' 2U0' ife' lio' M ^' 
respectively. For each x, the components larger than ^^i = 3^ are illustrated in red, and others are illustrated in blue. 
Obviously, from this process, all three clusters can be correctly extracted and outliers are elinimated. 



replicator dynamic is not to precisely partition the data 
into clusters, as most clustering methods do (HI, but to 
globally reveal outliers and clusters. 

In this section, we apply path following replicator 
dynamic to four problems, namely, maximum clique 
problem Q, densest /c-sub graph problem [10], structure 
fitting ||22|, and discovery of high-density regions |[T9l . 
Note that discrete replicator dynamic can be only ap- 
plied on maximum clique problem, among these four 
problems. 

4.1 Robustness Testing On Maximum Clique Prob- 
lem 

In this section, we test the robustness of path follow- 
ing replicator dynamic, that is, its ability to evolve to 
the global maximum of (0. A good test bed is the 
maximum clique problem. According to Motzkin-Straus 
theorem Q, the global maximum of © corresponds 
to the maximum clique on G. Thus, we can run both 
discrete replicator dynamic and path following replicator 
dynamic on the same graph, to see which dynamic has 
a higher chance to find the maximum clique. Of course, 
both dynamics may fail to find the maximum clique, 
since the maximum clique problem is a notoriously hard 
problem |9J. 

To get statistically meaningful results, we need a large 
number of graphs with known maximum cliques. Thus, 
we choose to randomly generate graphs. In our experi- 
ment, graph G is generated in the following way. First, 
generate a clique Gi of mi vertices. Second, randomly 
generate a graph G2. G2 has m2 vertices and Q. ^2(m2-i) 
edges, where a controls the density of this graph. Be- 
sides, G2 needs to follow a certain kind of degree 
distribution. Four kinds of degree distributions, namely, 
uniform distribution (U), binomial distribution (B), geo- 
metric law distribution (G) and power law distribution 
(P), are tested. According to these are the most 
common degree distributions observed in real networks. 
Finally, we randomly add /3mim2 edges between Gi and 
G2, to form the final graph G. Note that when both a and 
/3 are not large, the probability that Gi is the maximum 
clique of G is extremely high. 



TABLE 1 

The results of DRD and PFRD on the maximum clique 
problem, where % denotes the percentage of correctly 
detected maximum cliques. 





DRD 


PFRD($i) 


PFRD($2) 


PFRD($3) 




% 


Time 


% 


Time 


% 


Time 


% 


Time 


u 





0.046 


79 


0.087 


100 


0.151 


100 


0.420 


B 


100 


0.067 


100 


0.137 


100 


0.207 


100 


0.391 


G 





0.074 





0.171 


95 


0.130 


100 


0.337 


P 





0.048 





0.124 


91 


0.134 


100 


0.363 



In our experiment, mi = 100 and m2 = 900, thus, 
G has 1000 vertices in total. We set a = 0.11 and 
j3 = 0.005, to make the average degrees of Gi and 
G2 approximately the same. For each type of degree 
distribution, we randomly generate 100 graphs. For path 
following replicator dynamic, we use three samplings 
of path parameter, that is, $1 = {g^^ sM' • • • ' ik' ^i' 
<|). = I I- J- 1) and = 1^ ^ ^1) 

2 I 950 ' 900 ' • • • ' 50 ' J 3 I 990 ' 980 ' • • • ' 10 ' J " 

Recall that discrete replicator dynamic is a special case 
of path following replicator dynamic, with $ = {1}. 

The experimental results are shown in Table 1. Both 
the percentage of successfully detected the maximum 
cliques Gi and the average running time are reported. 
Clearly, the path following replicator dynamic is much 
more robust than discrete replicator dynamic. Moreover, 
the denser the samples are, the more robust the evolution 
process is. This is consistent with our intuition. For the 
time complexity, generally speaking, more samples lead 
to increased time; however, the run time seems to be not 
linear, but sub-linear, in the number of samples. For the 
random graphs with geometric law degree distribution, 
from to ^2/ the average run time surprisingly drops, 
although <l>2 contains much more samples than <l>i. One 
possible explanation is that as the number of samples 
increases, the number of fixed point iterations under 
each value of e reduces. 

4.2 Densest /c-Subgraph Problem 

As maximum clique problem, the densest /c-subgraph 
problem (D/cS) is also a fundamental and notoriously 
hard problem |[10|. For a graph G, the task is to find 
a subgraph G' with k vertices, and the total weight 




Fig. 3. Graphical illustration of the densest A:-subgraphs of the web graph cnr-2000, with k being 10000, 4000, 2000, 
1000, 800, 400, 200 and 100, respectively. Red color represents 1, and white color represents 0. 
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Fig. 4. The results of DkS on 10 large webgraphs. Feige's method is shown in magenta dashed curve, Ravi's method 
is shown in green dotted curve, the truncated power method is shown in blue dashdot curve, and our method is shown 
in red solid curve. This figure is best viewed in color. 



of edges in G' is the maximum among all subgraphs 
of G with k vertices. Algorithms for finding DkS are 
useful tools for many tasks. For example, they have been 
successfully used to select features for ranking ||24|, to 
identify cores of communities 1251 , and to combat link 
spam [26J. 

Since DkS is an NP hard problem, many heuristic- 
based algorithms have been proposed. For general k, 
the algorithm developed by Feige et al. |^| achieves 
the best approximation ratio of O(n^) w^ith e < 1/3. 
Ravi et al. [27l proposed 4-approximation algorithms for 
w^eighted DkS problem on complete graphs for which 
the weights satisfy the triangle inequality. Recently, Yuan 
and Zhang have proposed truncated power method 1281 , 
and achieved the state-of-the-art results on large web 
graphs. In general, however, Khot [29 J showed that DkS 
has no polynomial time approximation scheme (PTAS), 
assuming that there are no sub-exponential time algo- 
rithms for problems in NP. 

Suppose the solution of ^ with e = ^ is x, then 
X has at least k positive components. The k largest 
components in x form a set, denoted by Vk- Recall 
that X represents a cluster and Xi is the probability of 
this cluster containing the vertex Vi, then the subgraph 
induced by Vk is a natural candidate for the densest 
/c-subgraph. An obvious advantage of our approach is: 
we can get the candidates of the densest /c-subgraph for 
many ks in a single evolution process. 



Fig. 3 demonstrates the detected densest /c-subgraphs 
on cnr-2000. In one evolution process, dense subgraphs 
of various sizes are detected. As k decreases, the ob- 
tained dense subgraph becomes denser and denser, fi- 
nally it nearly becomes a whole graph when k = 100. 
Obviously, the evolution process reveals important infor- 
mation of a graph and it is very useful in graph analysis. 

We compare our approach with three methods, 
namely, Feige's method |[10|, Ravi's method |[10|, and 
truncated power method (TP) 1281 . The source codes of 
these three methods are downloaded from wet|l. For our 
approach, it is speeded up by setting Ss = le — 12, that is, 
setting the components of x(t) whose values are smaller 
than le — 12 to be zeros. 

We use 10 web graphs, namely, cnr-2000, in-2004, eu- 
2005, uk-2007-05, indochina-2004:, amazon-2008, Ijournal- 
2008, hollywood-2009, dblp-2010 and dblp-2011. All these 
web graphs are from the WebGraph framework provided 
by the Laboratory for Web Algorithm^. For directed 
graphs, we treat each directed arc as an undirected edge. 
Table 2 lists the statistics of the web graphs used in the 
experiment. 

For each graph, we compute its densest /c-subgraphs 
for 10 values of k, that is, {1000, 2000, . . . , 10000}. Both 
total weights of obtained subgraphs and running time 

2. https:/ / sites.google.com/ site/ xtyuanl980 

3. Datasets are available at |http: / /laeTdsi.unimi.it/ datasets.php] 
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TABLE 2 

The statistics of the web graph datasets. 



Graph 


Vertices( y ) 


Arc(|£;|) 


Average Degree 


LrLi Z.UUU 


OZO, OO ( 


O, ZID, lOZ 


Q 


in-2004 


1,382,908 


16,917, 053 


12.23 


eu-2005 


862,664 


19,235,140 


22.30 


uk-2007-05 


1,000,000 


41,247,159 


41.25 


indochina-2004 


7,414, 866 


194,109,311 


26.18 


amazon-2008 


735,323 


5,158,388 


7.02 


liournal-2008 


5,363,260 


79,023, 142 


14.73 


hollywood-2009 


1,139,905 


113,891,327 


99.91 


dblp-2010 


326, 186 


1,615,400 


4.95 


dblp-2011 


986,324 


6, 707, 236 


6.80 



TABLE 3 

Time used in DkS experiment. The time is measured in 
seconds. 



Graph 


Feige 


Ravi 


TP 


PFRD 


cnr-2000 


1.87 


688.31 


8.77 


10.75 


in-2004 


4.93 


1102.55 


23.75 


17.03 


eu-2005 


4.61 


1268.73 


39.57 


36.49 


uk-2007-05 


16.78 


2719.34 


50.70 


47.06 


indochina-2004 


31.06 


3531.42 


1321.69 


1257.47 


amazon-2008 


2.22 


999.93 


38.41 


12.12 


liournal-2008 


30.67 


1588.95 


258.92 


325.11 


hollywood-2009 


15.41 


1531.01 


187.51 


251.36 


dblp-2010 


1.68 


3516.15 


9.08 


7.38 


dblp-2011 


5.77 


1121.72 


33.58 


37.43 



are reported. To save space, for each method, we only 
report its total time of obtaining 10 subgraphs on each 
web graph. Our method compute all 10 densest k- 
sub graphs of one graph in one evolution process; while 
other methods compute them separately. 

Fig. 4 shows the total weight of dense subgraphs ver- 
sus the cardinality k. From the performance curves, we 
can observe that our approach consistently outperform 
other three methods on all graphs. Truncated power 
method performs well on 5 web graphs, but badly on eu- 
2005 and uk-2007-05. Ravi's method performs worst on 
nearly all web graphs, except for amazon-2008. Besides, 
our method performs extremely well for small k, this 
is because it obtains much more compact clusters than 
other methods. 

The running time is reported in Table 3. Three meth- 
ods, Feige's method, truncated power method and our 
method, are efficient; while Ravi's method is time con- 
suming. Feige's method is the fastest, since it only 
needs a few degree sorting operations. Both truncated 
power method and our method need iterative matrix- 
vector multiplications, and they nearly have the same 
time complexity. Matrix-vector multiplication means the 
time complexity is linear in the number of edges, 
which is still very efficient. For example, on the web 
graph liournal-2008, which has 5,363,260 vertices and 
79, 023, 142 edges, the whole evolution process of our 
approach only costs 325.11 seconds. In fact, it can run 
much faster on a computer with larger memory. This 
is because the main limitation is the memory for large 
graphs. For example, due to insufficient memory, both 



truncated power method and our approach slow down 
on the web graph indochina-2004:. 

4.3 Structure Fitting 

Structure fitting, that is, fitting a geometric model to 
data, is a fundamental task in computer vision [22J. A 
typical example is to fit lines for a given point set. In 
structure fitting, two key questions are: 1) how many 
structures in data? and 2) what are the parameters of 
these structures? In practice, structure fitting is a non- 
trivial task, since real-world data may contain single or 
multiple structures, and may also be contaminated by 
severe noises and large amount of outliers. 

At the year of 1981, Fischler and Bolles proposed 
the seminar work for structure fitting, RANSAC ||22|. 
RANSAC utilizes a "hypothesize-and-verify'' framework 
to detect potential structures. This framework is very 
robust to outliers, which is the main challenge in struc- 
ture fitting problems. After RANSAC, many structure 
fitting methods were proposed ||30|, ||3ll, ||32|. Although 
these methods improve the performance in some aspects, 
basically, they all adopt the "hypothesize-and-verify'' 
framework of RANSAC. 

The "hypothesize-and-verify'' framework generates 
many candidate structures, but does not tell us which of 
them are real structures. In fact, among these candidate 
structures, most are fake structures, and some candidate 
structures correspond to the same real structures. Thus, 
we need to eliminate duplications and select real struc- 
tures according to a certain criterion, or alternatively, 
estimate the number of real structures. In this direction, 
three representative works are J-linkage ||33|, KF (SlI and 
RCG IHHI. In J-linkage [33J, a "conceptual representa- 
tion'', essentially a reduction of the parameter space to 
a one-dimensional discrete space of hypothesis index, is 
proposed. Robust fitting then proceeds by agglomerative 
clustering of the conceptual representations of the data 
points. In KF ||34|, Chin proposed to sort the residuals 
and construct an affinity measure based on the sorted 
residuals. Such affinity measure can be used as kernel 
to estimate the number of clusters 13411 . In RCG, we vir- 
tually construct a hypergraph, called random consensus 
graph (RCG), based on random sampling, then we build 
a binary graph from this virtual graph and obtain real 
structures by detecting dense subgraphs on the binary 
graph. The binary graph has nearly the same cluster 
structure as random consensus graph, but needs much 
less memory and can be constructed more efficiently. 

We first do experiments on random planar point sets. 
The point set is generated as follows: first generate 
3 lines, with each line containing points, then add 
Gaussian noise 7V(0,cr) to these points, finally add Uq 
uniformly distributed outliers to the point set. The whole 
point set is within the region [—1.5, 1.5]^. 

For line fitting, the basic relation is of order three, that 
is, for any three points, we can determine whether they 
are on a line or not. Thus, from the point set, we can 
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Fig. 5. Graphical illustration of path following replicator dynamic on a planar point cloud (e = The point cloud has 
three lines, with each having 100 inliers, and there are also 300 outliers. Red points are inside the detected clusters, 
and blue points are outside the detected clusters. 



build a hypergraph, where each hyperedge is of order 
three. However, it is time consuming to construct the 
hypergraph, and the hypergraph has so many hypereges 
as to fill up all memory quickly Instead, we use the 
method proposed in [35 J to directly construct a binary 
graph, and run path following replicator dynamic on this 
binary graph. 

Fig. 5 illustrates the path following replicator dynamic 
on an exemplar point cloud. In this experiment, = 
100, Ho = 300 cr = 0.01, ^1 = 6^ and 1000 hypotheses 
are generated. For each x, the points in Cx are shown 
in red, others are shown in blue. As the figure shows, 
in the evolution process, outliers are dropped first, then 
points on one line, points on another line, and finally 
only points on the third line remain. Clearly, this process 
is very helpful for us to precisely detect all lines, since 
inliers of different lines are separated, as well as outliers. 

To quantitatively measure the performance of this pro- 
cess, we randomly generate 100 point sets, and calculate 
the average proportion of inliers in the cluster Cx for 
different values of e. For each line, all the points whose 
distances to it are smaller than a = 0.01 are considered 
as inliers. The union of inliers of all three lines form 
the ground truth set GT. For each x, we get a set 
Cx = {i\xi > 6^}. Then p = ^^^^^ is defined as 
precision of the set Cst- 

Fig. 6 plots the average precision over 100 experiments 
as a function of k, where k = K Both mean precision 
and one std below the mean are illustrated. When k is 
small, as expected, nearly all points in Cst are inliers. 
As k increases, the precision drops a little, but still very 
high. This is because some points close to the three 
lines in data are considered as outliers. Finally, when 
k is very large, outliers are involved and the precision 
drops fast. The std is always small, which means that in 
the 100 experiments, path following replicator dynamic 



Algorithm 3 Structure Fitting Based on Path Following 
Replicator Dynamic 

1: Input: The cluster sequence {Cx(i), Cx(2), • • • , C'x(m)}/ 

and a deviation threshold S4. 
2: Fit a structure s to Cst{m) and set F = {s}, set 1 = 

3: for i = m — 1,...,1 do 

4: Set3 = Cx(,)A 

5: for Each vertex v e2 do 

6: If the deviation of v to any structure in F is 
smaller than a threshold 64, then 1 = 1 u v and 
2 = 2/v. 

7: end for 

8: if > 2* [^] then 

9: Fit a structure s to the vertices in 2. All the 
vertices in 2 whose deviations to structure s are 
smaller than 64 form a set H. 

10: if IHI > then 

11: Set r = r U {s} and T = T U H; 

12: else 

13: Terminate; 

14: end if 

15: end if 

16: end for 

17: Output: The set of fitted structures, F- 



performs steadily. 

As Fig. 5 shows, the sequence {Cx(i), Cx(2), • • • , C'x(m)} 
gradually reveals the real structures in data, with points 
in the same structure gathering together. Thus, by back- 
tracking the sequence {Cx(i), Cx(2), • • • , C'x(m)}/ we can 
reveal real structures one by one. The algorithm is sum- 
marized in Alg. 3. 

In Alg. 3, F stores all the fitted structures, and 1 
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contains vertices which are inliers of these structures. For 
a structure s, its inliers are the vertices whose deviations 
to it are smaller than a threshold 84^. Since we detect a 
new structure from the vertices in H, we require that 
H contains a sufficient number of vertices. Here the 
threshold 2* [^] is twice the least number of vertices in 
the cluster Cx(m)- If the fit of structure s to the vertices 
in H does not have enough inliers, then the algorithm 
terminates. The same as many structure fitting methods, 
Alg. 3 is controlled by two parameters, the deviation 
threshold 6/^ and the minimal number of inliers in a 
structure, [^1- Note that duplications are automatically 
eliminated in the backtracking process of Alg. 3. 

We compare Alg. 3 with three methods, namely, J- 
Linkage, KF and RCG. J-linkage partitions data into 
many clusters, and large clusters are regarded as real 
structures. For KF, it has no parameter, since it can 
automatically estimate the number of clusters by some 
heuristic rules; however, the estimated number may 
be incorrect sometimes, this is because estimating the 
number of clusters in data is a notoriously hard problem 
1361 . For RCG, it considers clusters with large objective 
values as real structures. 

We test all four methods on randomly generated point 
sets, under different levels of noises. We keep Ui = 100 
and Ho = 300, and increase the noise parameter a from 
0.01 to 0.08, with step 0.01. For each point set, we 
randomly select 1000 minimal size samples and generate 
1000 hypotheses. For each value of a, we repeat the 
experiments 100 times. The performance is measured 
by average fitting error, that is, the average distance of 
inliers of three lines to their corresponding fitted lines. 
Here inliers mean points whose distances to any of 
three real lines are smaller than <j. For each method, we 
tune its parameters to obtain the best performanc^. The 
results are reported in Table 4, both average fitting error 
and average running time (in parentheses) are reported. 
For KF, the estimated number of structures may be 
incorrect, thus, we also report the times when it correctly 
estimates the number of structures (in square brackets), 
and its average fitting error is averaged only over these 
experiments. Both RCG and our method have much 
better performance, since the structures are estimated 
from points in dense clusters, which are mostly points 
on the real lines. The performance of J-linkage degrades 
quickly as the level of noises increases, probably because 
the obtained cluster is not so compact. For the time 
complexity, KF is most time consuming, since it does 
singular value decomposition on kernel matrix. J-linkage 
is also slow, due to its agglomerative clustering step. 
RCG is the fastest, since its computation is restricted to 
small subgraphs. Our method is only a bit slower than 
RCG. This is because the evolution process operates on 
the whole graph at the first few iterations. 

In Fig. 7, we illustrate the results of line fitting on three 

4. This may be a little unfair for KF method, since it has no parameter 
to tune. 
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Fig. 6. The proportion of inliers in the clusters of path 
following replicator dynamic. Red solid curve illustrates 
the average precision, and blue dotted curve illustrates 
the curve of one std below the mean. 



(a) (b) J-linkage (c) KF (d) RCG (e) PFRD 



Fig. 7. Results of line fitting on three real images. 

real images. In each image, there are multiple lines, and 
the sizes of these lines vary drastically. It is a difficult 
task to detect short lines without false alarm. This is 
because some fake structures are ranked higher than real 
short lines. Thus, for each method, we only illustrate 
the real structures in its top-A: ranked detections, with k 
being 12, 10, 10 for the first, second and third imag^ 
respectively. Note that for some tracks, both of its two 
edges are detected. For clarity, we only illustrate one of 
them. Generally speaking, all four methods successfully 
detect long lines, and the differences lie in the detection 
of short lines. Due to the help of evolution process, our 
method correctly detect most of short lines. KF fails to 
detect all short lines, probably because it regards them 
as outliers, thus estimates the number of clusters incor- 
rectly. Thus, this experiment also shows the difficulty of 
estimating the number of clusters in real data, especially 
for data with multiple scales. 

4.4 Discovery of High-Density Regions 

Give a dataset P with n points, P = {pi,...,Pn}/ 
we can construct a graph G, with the weight Wij = 
exp(— where d{pi,pj) represents the distance 
between pi and pj, and h is the bandwidth parameter. As 
mentioned before, the path following replicator dynamic 

5. For KF method, we report the real structures in all of its detections. 
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TABLE 4 

Experimental results on line fitting. In each cell, the top 
value is the average fitting error, the bottom value is the 
average run time, measured in seconds. 



0.01 



J-Linkage 



9.449e - 4 
(0.5132) 



0.0037 
(0.5607) 



0.0065 
(0.5186) 



0.0146 
(0.5860) 



0.0178 
(0.5339) 



0.0217 
(0.5678) 



0.0245 
(0.5882) 



0.0372 
(0.5417) 



9.462e - 4[83] 
(0.7386) 



0.0031 [67] 
(0.7785) 



0.0053[75J 
(0.7682) 



0.0097[72] 
(0.7841) 



0.0138[54] 
(0.7771) 



0.0174[52] 
(0.7634) 



0.0196[58J 
(0.7584) 

0.0251 [47J 
(0.7930) 



RCG 



9.634e - 4 
(0.1202) 



0.0025 
(0.1198) 



0.0040 
(0.1172) 



0.0068 
(0.1110) 



0.0076 
(0.1102) 



0.0118 

(0.1151) 



0.0124 

(0.1142) 



0.0155 
(0.1158) 



Our method 



5.585e-4 

(0.1315) 



0.0023 

(0.1386) 



0.0038 

(0.1357) 



0.0067 

(0.1204) 



0.0069 

(0.1143) 



0.0122 
(0.1247) 



0.0129 
(0.1264) 



0.0151 

(0.1253) 







mm 










Fig. 8. The evolution process of path following replicator 
dynamic on Chameleon dataset (37]. The points in clus- 
ters are shown in red; while the points out of clusters are 
shown in blue. Clearly, as e increases, outliers gradually 
disappear and high-density regions emerge. 



on such graph G can be considered as a shrink process of 
high-density regions. In this shrink process, high-density 
regions at different scales, which represent important 
patterns in dataset P, will naturally emerge. 

Fig. 8 demonstrates the evolution process of path 
following replicator dynamic on Chameleon dataset El 
which consists four 2d point sets. These four point sets 
contain high-density regions in complex forms, as well 
as outliers. The second point set contains 10000 points, 
and other three point sets contain 8000 points. In this 
experiment, we set h = 10 and Si = 0.0001. As the 
results illustrate, path following replicator dynamic can 
successfully eliminate outliers and reveal high-density 
regions, despite there are multiple disjoint high-density 
regions and the shape of these high-density regions 
are complex, which is a big challenge to many other 
methods, such as mean shift ||20| and one-class SVM ITSll . 

We also do experiment on a hand pose dataset, which 
is collected by a human-computer interaction software, 
using Micorsoft Kinec£|. The user interacted with com- 
puter by hand poses, with some poses having specific 



TABLE 5 

Experimental results on the hand pose dataset. 





MS 


EC 


1-SVM 


PFRD 


Precision 


31.22% 


75.28% 


72.90% 


85.62% 



meanings, and others being meaningless. The dataset 
contains 12000 instances of hand poses in total. Each 
instance is a 80 x 80 depth image. There are three 
meaningful hand poses, namely, extend, point and fist, 
and each has 2000 instances. The other 6000 instance are 
meaningless, thus are considered as outliers. The task 
is to discover meaningful hand poses, and at the same 
time, identify outliers. Each meaningful hand pose may 
be captured in different viewpoints and distances, thus 
all of its instances form a high-density region of complex 
shape. We compare with three methods, namely, mean 
shift (MS) 1201, ensemble clustering (EC) |[l6l and one- 
class SVM (1-SVM) EHl- For our method, we set ^ = 
{iMoo' • • • ' 6^}- According to x(5) {e = we 
obtain a cluster of 6000 points, then compute its precision, 
which is the proportion of inliers in this cluster. We also 
correctly estimate the number of meaningful poses on 
this cluster by gap statistic Il36l . For mean shift, it detects 
a large number of modes, and in a high-density region, 
there are usually multiple modes. To identify outliers, 
we choose 3 most significant modes, and for each mode, 
we find its 2000 nearest neighbors. In this way, we also 
get a cluster of 6000 points and compute its precision. 
In ensemble clustering, there is a parameter to control 
the size of each detected dense cluster, and we set it 
to 2000. Three significant clusters, with each containing 
2000 points, are detected. We compute the precision of 
the union of these three clusters. For one-class SVM Ei 
it calculates a surface to separate inliers and outliers. 
We adjust its parameter to make the numbers of points 
one both sides of surface are approximately equal, then 
calculate the precision of points on the side classified as 
inliers. The precision of all four methods is reported in 
Table 5. 

As the experimental results show, PFRD significantly 
outperforms the other methods, because it identifies 
outliers by finding globally densest regions. Mean shift 
detects the densest points in feature space, which only 
indicates the existence of high-density regions. Since we 
simply use the distances to these high-density points to 
identify outliers, and the shape of high-density regions is 
complex, mean shift performs badly on this dataset. For 
ensemble clustering, since the least number of vertices 
in a cluster is set to 2000, it detects high-density regions 
in the large scale, thus performs well. Due to the usage 
of kernel trick, one-class SVM can detect high-density 
regions of complex shapes, and it also performs well one 
this dataset. 



6. littp:/ / glaros.dtc.umn.edu/ gkliome/ cluto/ cluto/ download 

7. http://www.microsoft.com/en-us/kinectforwindows/ 



8. we use libsvm: 'http:/ / www.csie.ntu.edu.tw/ cjlin/libsvm| 
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5 Conclusion 

The proposed path following replicator dynamic is a 
generalization of discrete replicator dynamic. The intro- 
duced dynamic path parameter controls the behavior of 
the evolution process. As a result, the evolution process 
is less sensitive to degree distribution and mainly de- 
termined by the global structure of a graph. Due to its 
global awareness, the proposed dynamic can automati- 
cally gather vertices based on their cluster membership. 
This makes it extremely powerful and useful as a general 
tool for discovering the cluster structure of graphs. This 
fact is demonstrated on four different applications. Due 
to its high efficiency, the proposed method can be easily 
integrated into many complex systems as a computation- 
ally cheap but effective module. 
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